Overview

Dataset statistics

Number of variables: 11
Number of observations: 13131252
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 2.3 GiB
Average record size in memory: 190.0 B
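
As a rough consistency check on the two memory figures, the sketch below multiplies the average record size by the number of observations and converts the result to GiB. It also shows that one 8-byte column over all rows comes to about 100.2 MiB, the size reported for each numeric variable later in this report. The 8-byte width is an assumption (a 64-bit dtype); the report does not state the dtypes.

```python
# Figures copied from the dataset statistics in this report.
n_observations = 13_131_252
avg_record_bytes = 190.0

GIB = 2 ** 30
MIB = 2 ** 20

# Average record size times row count approximates the total in-memory size.
total_gib = n_observations * avg_record_bytes / GIB

# Assumption: a numeric column is stored as one 64-bit (8-byte) value per row.
column_mib = n_observations * 8 / MIB

print(f"estimated total size: {total_gib:.1f} GiB")
print(f"one 8-byte column:    {column_mib:.1f} MiB")
```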

Variable types

NUM: 8
CAT: 2
BOOL: 1

Warnings

clv is highly correlated with transaction_value and 1 other field [High correlation]
transaction_value is highly correlated with clv and 1 other field [High correlation]
earned_reward_points is highly correlated with transaction_value and 2 other fields [High correlation]
total_reward_points is highly correlated with earned_reward_points [High correlation]
referred_friends has 5010003 (38.2%) zeros [Zeros]

Reproduction

Analysis started: 2020-11-18 17:25:50.572509
Analysis finished: 2020-11-18 17:42:30.181968
Duration: 16 minutes and 39.61 seconds
Software version: pandas-profiling v2.9.0
Download configuration: config.yaml
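
The reported duration follows directly from the two timestamps. A minimal sketch of that subtraction, using only the standard library:

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"

# Timestamps copied from the Reproduction section.
started = datetime.strptime("2020-11-18 17:25:50.572509", FMT)
finished = datetime.strptime("2020-11-18 17:42:30.181968", FMT)

elapsed = finished - started
minutes, seconds = divmod(elapsed.total_seconds(), 60)

print(f"{int(minutes)} minutes and {seconds:.2f} seconds")
```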

Variables

customer_id
Real number (ℝ≥0)

Distinct: 924342
Distinct (%): 7.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 746300.5984
Minimum: 2
Maximum: 1423208
Zeros: 0
Zeros (%): 0.0%
Memory size: 100.2 MiB

Quantile statistics

Minimum: 2
5-th percentile: 135089
Q1: 514503
Median: 770364
Q3: 999609
95-th percentile: 1263389
Maximum: 1423208
Range: 1423206
Interquartile range (IQR): 485106
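
The last two rows of the quantile table are derived from the others. A short sketch of that arithmetic, using the customer_id figures:

```python
# Quantile figures copied from the customer_id table.
minimum, q1, q3, maximum = 2, 514503, 999609, 1423208

value_range = maximum - minimum  # spread between the extremes
iqr = q3 - q1                    # spread of the middle 50% of observations

print(value_range, iqr)
```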

Descriptive statistics

Standard deviation: 335268.1639
Coefficient of variation (CV): 0.4492401113
Kurtosis: -0.6823767032
Mean: 746300.5984
Median Absolute Deviation (MAD): 240714
Skewness: -0.2604560407
Sum: 9.799861226e+12
Variance: 1.124047417e+11
Monotonicity: Not monotonic
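
Several of the descriptive statistics are tied together by simple identities: the coefficient of variation is the standard deviation divided by the mean, the variance is the square of the standard deviation, and the sum is the mean times the number of observations. The sketch below recomputes them from the rounded customer_id figures, so agreement is only expected to within rounding.

```python
import math

n = 13_131_252       # number of observations
mean = 746300.5984   # reported mean
std = 335268.1639    # reported standard deviation

cv = std / mean      # coefficient of variation
variance = std ** 2  # variance is the squared standard deviation
total = mean * n     # sum of all values

print(f"CV = {cv:.6f}, variance = {variance:.6e}, sum = {total:.6e}")
```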
Histogram with fixed size bins (bins=50)
Common values
Value                    Count     Frequency (%)
478997                   45        < 0.1%
521105                   45        < 0.1%
610906                   45        < 0.1%
635462                   45        < 0.1%
316405                   45        < 0.1%
275433                   45        < 0.1%
349125                   45        < 0.1%
496541                   45        < 0.1%
512917                   45        < 0.1%
471945                   45        < 0.1%
700972                   45        < 0.1%
488321                   45        < 0.1%
127825                   45        < 0.1%
135981                   45        < 0.1%
569977                   45        < 0.1%
627293                   45        < 0.1%
635481                   45        < 0.1%
651857                   45        < 0.1%
242438                   45        < 0.1%
209686                   45        < 0.1%
193326                   45        < 0.1%
37730                    45        < 0.1%
406452                   45        < 0.1%
480152                   45        < 0.1%
4976                     45        < 0.1%
Other values (924317)    13130127  > 99.9%
 
Minimum 10 values
Value                    Count     Frequency (%)
2                        45        < 0.1%
4                        43        < 0.1%
7                        2         < 0.1%
14                       16        < 0.1%
20                       45        < 0.1%
30                       45        < 0.1%
40                       45        < 0.1%
43                       45        < 0.1%
47                       9         < 0.1%
89                       45        < 0.1%
 
Maximum 10 values
Value                    Count     Frequency (%)
1423208                  1         < 0.1%
1423207                  1         < 0.1%
1423206                  1         < 0.1%
1423205                  1         < 0.1%
1423204                  1         < 0.1%
1423203                  1         < 0.1%
1423202                  1         < 0.1%
1423201                  1         < 0.1%
1423200                  1         < 0.1%
1423199                  1         < 0.1%
 

month
Categorical

Distinct: 45
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 100.2 MiB
Common values
Value                    Count     Frequency (%)
2020-10-01               347964    2.6%
2020-09-01               345254    2.6%
2020-08-01               342678    2.6%
2020-07-01               340267    2.6%
2020-06-01               337663    2.6%
2020-05-01               335065    2.6%
2020-04-01               332608    2.5%
2020-03-01               330286    2.5%
2020-02-01               327755    2.5%
2020-01-01               325231    2.5%
2019-12-01               322717    2.5%
2019-11-01               320025    2.4%
2019-10-01               317369    2.4%
2019-09-01               315071    2.4%
2019-08-01               312588    2.4%
2019-07-01               310174    2.4%
2019-06-01               307742    2.3%
2019-05-01               305000    2.3%
2019-04-01               302328    2.3%
2019-03-01               299943    2.3%
2019-02-01               297348    2.3%
2019-01-01               294731    2.2%
2018-12-01               292010    2.2%
2018-11-01               289481    2.2%
2018-10-01               286795    2.2%
Other values (20)        5193159   39.5%
 
Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%
Histogram of lengths of the category

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Overview of Unicode Properties

Unique unicode characters: 11
Unique unicode categories: 2
Unique unicode scripts: 1
Unique unicode blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
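
As an illustration of these character properties, the sketch below looks up the Unicode general category of each character in one month value with the standard `unicodedata` module, then scales the per-value counts by the number of observations. The scaling assumes every value has the same YYYY-MM-DD shape as the sample, which is consistent with the reported minimum and maximum length of 10 but is not stated outright in the report.

```python
import unicodedata
from collections import Counter

sample = "2020-10-01"       # one value of the month variable
n_observations = 13_131_252

# General category per character: 'Nd' = Decimal Number, 'Pd' = Dash Punctuation.
categories = Counter(unicodedata.category(ch) for ch in sample)

# Assumption: all values share the sample's pattern (8 digits, 2 dashes).
totals = {cat: count * n_observations for cat, count in categories.items()}

print(dict(categories))
print(totals)
```

Under that assumption the totals match the Decimal Number and Dash Punctuation counts listed for this variable.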

Most occurring characters

Value                    Count      Frequency (%)
0                        41015486   31.2%
1                        27599577   21.0%
-                        26262504   20.0%
2                        18497511   14.1%
9                        4902759    3.7%
8                        4517556    3.4%
7                        3909232    3.0%
6                        1167518    0.9%
5                        1156959    0.9%
4                        1146661    0.9%
3                        1136757    0.9%
 

Most occurring categories

Value                    Count      Frequency (%)
Decimal Number           105050016  80.0%
Dash Punctuation         26262504   20.0%
 

Most frequent Decimal Number characters

Value                    Count      Frequency (%)
0                        41015486   39.0%
1                        27599577   26.3%
2                        18497511   17.6%
9                        4902759    4.7%
8                        4517556    4.3%
7                        3909232    3.7%
6                        1167518    1.1%
5                        1156959    1.1%
4                        1146661    1.1%
3                        1136757    1.1%
 

Most frequent Dash Punctuation characters

Value                    Count      Frequency (%)
-                        26262504   100.0%
 

Most occurring scripts

Value                    Count      Frequency (%)
Common                   131312520  100.0%
 

Most frequent Common characters

Value                    Count      Frequency (%)
0                        41015486   31.2%
1                        27599577   21.0%
-                        26262504   20.0%
2                        18497511   14.1%
9                        4902759    3.7%
8                        4517556    3.4%
7                        3909232    3.0%
6                        1167518    0.9%
5                        1156959    0.9%
4                        1146661    0.9%
3                        1136757    0.9%
 

Most occurring blocks

Value                    Count      Frequency (%)
ASCII                    131312520  100.0%
 

Most frequent ASCII characters

Value                    Count      Frequency (%)
0                        41015486   31.2%
1                        27599577   21.0%
-                        26262504   20.0%
2                        18497511   14.1%
9                        4902759    3.7%
8                        4517556    3.4%
7                        3909232    3.0%
6                        1167518    0.9%
5                        1156959    0.9%
4                        1146661    0.9%
3                        1136757    0.9%
 

months_since_joined
Real number (ℝ≥0)

Distinct: 106
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 24.72676878
Minimum: 1
Maximum: 106
Zeros: 0
Zeros (%): 0.0%
Memory size: 100.2 MiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 6
Median: 17
Q3: 37
95-th percentile: 72
Maximum: 106
Range: 105
Interquartile range (IQR): 31

Descriptive statistics

Standard deviation: 22.70218206
Coefficient of variation (CV): 0.9181216626
Kurtosis: 0.4530905045
Mean: 24.72676878
Median Absolute Deviation (MAD): 13
Skewness: 1.104203205
Sum: 324693432
Variance: 515.3890704
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value                    Count     Frequency (%)
1                        702582    5.4%
2                        644194    4.9%
3                        586733    4.5%
4                        526036    4.0%
5                        471271    3.6%
6                        426669    3.2%
7                        391525    3.0%
8                        363110    2.8%
9                        339933    2.6%
10                       320380    2.4%
11                       303274    2.3%
12                       288230    2.2%
13                       274392    2.1%
14                       262089    2.0%
15                       250702    1.9%
16                       240239    1.8%
17                       230669    1.8%
18                       221430    1.7%
19                       212851    1.6%
20                       204892    1.6%
21                       197393    1.5%
22                       190301    1.4%
23                       183561    1.4%
24                       177180    1.3%
25                       171119    1.3%
Other values (81)        4950497   37.7%
 
Minimum 10 values
Value                    Count     Frequency (%)
1                        702582    5.4%
2                        644194    4.9%
3                        586733    4.5%
4                        526036    4.0%
5                        471271    3.6%
6                        426669    3.2%
7                        391525    3.0%
8                        363110    2.8%
9                        339933    2.6%
10                       320380    2.4%
 
Maximum 10 values
Value                    Count     Frequency (%)
106                      784       < 0.1%
105                      1694      < 0.1%
104                      2595      < 0.1%
103                      3508      < 0.1%
102                      4397      < 0.1%
101                      5356      < 0.1%
100                      6310      < 0.1%
99                       7279      0.1%
98                       8281      0.1%
97                       9293      0.1%
 

referred_friends
Real number (ℝ≥0)

ZEROS

Distinct: 10
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.9683107902
Minimum: 0
Maximum: 9
Zeros: 5010003
Zeros (%): 38.2%
Memory size: 100.2 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 1
Q3: 2
95-th percentile: 3
Maximum: 9
Range: 9
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 0.98818099
Coefficient of variation (CV): 1.020520478
Kurtosis: 1.056525664
Mean: 0.9683107902
Median Absolute Deviation (MAD): 1
Skewness: 1.025474779
Sum: 12715133
Variance: 0.976501669
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=10)
Common values
Value                    Count     Frequency (%)
0                        5010003   38.2%
1                        4799762   36.6%
2                        2331315   17.8%
3                        759880    5.8%
4                        186565    1.4%
5                        36632     0.3%
6                        6095      < 0.1%
7                        900       < 0.1%
8                        89        < 0.1%
9                        11        < 0.1%
 
Minimum 10 values
Value                    Count     Frequency (%)
0                        5010003   38.2%
1                        4799762   36.6%
2                        2331315   17.8%
3                        759880    5.8%
4                        186565    1.4%
5                        36632     0.3%
6                        6095      < 0.1%
7                        900       < 0.1%
8                        89        < 0.1%
9                        11        < 0.1%
 
Maximum 10 values
Value                    Count     Frequency (%)
9                        11        < 0.1%
8                        89        < 0.1%
7                        900       < 0.1%
6                        6095      < 0.1%
5                        36632     0.3%
4                        186565    1.4%
3                        759880    5.8%
2                        2331315   17.8%
1                        4799762   36.6%
0                        5010003   38.2%
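
Because referred_friends takes only ten distinct values, its summary statistics can be recomputed from the value counts alone. The sketch below does so for the number of observations, sum, mean, variance and share of zeros. The use of the sample variance (n - 1 in the denominator) is an assumption about how the report computes it; it is the pandas default.

```python
import math

# Value -> count, copied from the referred_friends frequency table.
counts = {0: 5010003, 1: 4799762, 2: 2331315, 3: 759880, 4: 186565,
          5: 36632, 6: 6095, 7: 900, 8: 89, 9: 11}

n = sum(counts.values())
total = sum(value * count for value, count in counts.items())
mean = total / n

# Assumption: sample variance (n - 1 denominator), as pandas uses by default.
sum_sq_dev = sum(count * (value - mean) ** 2 for value, count in counts.items())
variance = sum_sq_dev / (n - 1)

zeros_pct = 100 * counts[0] / n

print(n, total)
print(f"mean={mean:.6f} variance={variance:.6f} std={math.sqrt(variance):.6f}")
print(f"zeros: {zeros_pct:.1f}%")
```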
 

transaction_count
Real number (ℝ≥0)

Distinct: 148
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 30.67165538
Minimum: 0
Maximum: 149
Zeros: 45158
Zeros (%): 0.3%
Memory size: 100.2 MiB

Quantile statistics

Minimum: 0
5-th percentile: 12
Q1: 20
Median: 26
Q3: 38
95-th percentile: 64
Maximum: 149
Range: 149
Interquartile range (IQR): 18

Descriptive statistics

Standard deviation: 16.32709408
Coefficient of variation (CV): 0.5323186467
Kurtosis: 2.109577431
Mean: 30.67165538
Median Absolute Deviation (MAD): 8
Skewness: 1.336677752
Sum: 402757236
Variance: 266.5740012
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value                    Count     Frequency (%)
21                       508275    3.9%
22                       507235    3.9%
23                       498900    3.8%
20                       498434    3.8%
24                       481533    3.7%
19                       479820    3.7%
25                       458337    3.5%
18                       449234    3.4%
26                       435258    3.3%
17                       412095    3.1%
27                       407648    3.1%
28                       381177    2.9%
16                       367367    2.8%
29                       353649    2.7%
30                       327000    2.5%
15                       318842    2.4%
31                       303186    2.3%
32                       280148    2.1%
14                       266073    2.0%
33                       260134    2.0%
34                       241689    1.8%
35                       225266    1.7%
13                       213049    1.6%
36                       209947    1.6%
37                       196567    1.5%
Other values (123)       4050389   30.8%
 
Minimum 10 values
Value                    Count     Frequency (%)
0                        45158     0.3%
1                        16055     0.1%
2                        16346     0.1%
3                        17887     0.1%
4                        21539     0.2%
5                        25771     0.2%
6                        32466     0.2%
7                        40922     0.3%
8                        52701     0.4%
9                        69438     0.5%
 
Maximum 10 values
Value                    Count     Frequency (%)
149                      2         < 0.1%
148                      4         < 0.1%
147                      1         < 0.1%
146                      1         < 0.1%
145                      1         < 0.1%
143                      1         < 0.1%
141                      2         < 0.1%
140                      2         < 0.1%
139                      2         < 0.1%
138                      3         < 0.1%
 

transaction_value
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 12748860
Distinct (%): 97.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 169.7882566
Minimum: 0
Maximum: 2037.223285
Zeros: 45158
Zeros (%): 0.3%
Memory size: 100.2 MiB

Quantile statistics

Minimum: 0
5-th percentile: 36.8957677
Q1: 59.18877707
Median: 89.30977162
Q3: 192.5665371
95-th percentile: 593.2800142
Maximum: 2037.223285
Range: 2037.223285
Interquartile range (IQR): 133.37776

Descriptive statistics

Standard deviation: 198.0616219
Coefficient of variation (CV): 1.166521324
Kurtosis: 9.355484017
Mean: 169.7882566
Median Absolute Deviation (MAD): 40.46852149
Skewness: 2.775731504
Sum: 2229532384
Variance: 39228.40606
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value                    Count     Frequency (%)
0                        45158     0.3%
40                       7816      0.1%
37                       7713      0.1%
41                       7703      0.1%
42                       7685      0.1%
43                       7680      0.1%
39                       7643      0.1%
38                       7521      0.1%
44                       7345      0.1%
45                       7294      0.1%
36                       7210      0.1%
46                       7196      0.1%
35                       7158      0.1%
47                       6893      0.1%
34                       6822      0.1%
48                       6783      0.1%
49                       6589      0.1%
33                       6452      < 0.1%
50                       6358      < 0.1%
51                       6160      < 0.1%
32                       6013      < 0.1%
52                       5942      < 0.1%
53                       5752      < 0.1%
31                       5606      < 0.1%
54                       5554      < 0.1%
Other values (12748835)  12921206  98.4%
 
Minimum 10 values
Value                    Count     Frequency (%)
0                        45158     0.3%
1                        4032      < 0.1%
1.000011627              1         < 0.1%
1.000888659              1         < 0.1%
1.001540567              1         < 0.1%
1.001886487              1         < 0.1%
1.00192594               1         < 0.1%
1.00211409               1         < 0.1%
1.002504755              1         < 0.1%
1.002552263              1         < 0.1%
 
Maximum 10 values
Value                    Count     Frequency (%)
2037.223285              1         < 0.1%
2023.78753               1         < 0.1%
2008.624379              1         < 0.1%
2008.176342              1         < 0.1%
2005.91362               1         < 0.1%
1994.490227              1         < 0.1%
1991.680873              1         < 0.1%
1991.21774               1         < 0.1%
1987.945815              1         < 0.1%
1981.53822               1         < 0.1%
 

clv
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 12761597
Distinct (%): 97.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4308.193435
Minimum: 0
Maximum: 67896.16483
Zeros: 45158
Zeros (%): 0.3%
Memory size: 100.2 MiB

Quantile statistics

Minimum: 0
5-th percentile: 235.9424685
Q1: 445.9564406
Median: 1272.689167
Q3: 4899.340984
95-th percentile: 19322.88185
Maximum: 67896.16483
Range: 67896.16483
Interquartile range (IQR): 4453.384544

Descriptive statistics

Standard deviation: 6905.088207
Coefficient of variation (CV): 1.602780449
Kurtosis: 9.091475342
Mean: 4308.193435
Median Absolute Deviation (MAD): 974.3590553
Skewness: 2.776506826
Sum: 5.657197366e+10
Variance: 47680243.14
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value                    Count     Frequency (%)
0                        45158     0.3%
613.3135658              365       < 0.1%
642.304537               357       < 0.1%
830.8478751              356       < 0.1%
768.5342845              354       < 0.1%
690.1756909              350       < 0.1%
861.7532421              347       < 0.1%
739.234615               344       < 0.1%
567.4826704              344       < 0.1%
726.9918908              342       < 0.1%
840.209411               340       < 0.1%
709.3472378              338       < 0.1%
510.9961501              337       < 0.1%
747.6903318              335       < 0.1%
914.0621986              335       < 0.1%
759.213929               334       < 0.1%
595.790321               334       < 0.1%
699.2759872              332       < 0.1%
651.8325969              332       < 0.1%
789.3054814              332       < 0.1%
883.2970731              331       < 0.1%
648.3600552              331       < 0.1%
747.7630876              329       < 0.1%
523.3129775              328       < 0.1%
779.1932429              328       < 0.1%
Other values (12761572)  13077939  99.6%
 
ValueCountFrequency (%) 
0451580.3%
 
6.609117471< 0.1%
 
7.7329317124< 0.1%
 
8.3426637114< 0.1%
 
8.98360144618< 0.1%
 
9.65475669122< 0.1%
 
10.3547306329< 0.1%
 
10.428329641< 0.1%
 
10.566679281< 0.1%
 
10.662701171< 0.1%
 
Maximum 10 values
Value                    Count     Frequency (%)
67896.16483              1         < 0.1%
67447.20261              1         < 0.1%
66939.13685              1         < 0.1%
66928.0942               1         < 0.1%
66850.22455              1         < 0.1%
66464.80528              1         < 0.1%
66374.47992              1         < 0.1%
66361.74222              1         < 0.1%
66253.85529              1         < 0.1%
66034.91039              1         < 0.1%
 

total_reward_points
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 13130088
Distinct (%): > 99.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 7889.968914
Minimum: 0
Maximum: 596106.4354
Zeros: 663
Zeros (%): < 0.1%
Memory size: 100.2 MiB

Quantile statistics

Minimum: 0
5-th percentile: 59.99981165
Q1: 299.3680261
Median: 1223.010107
Q3: 5595.998816
95-th percentile: 39473.54535
Maximum: 596106.4354
Range: 596106.4354
Interquartile range (IQR): 5296.63079

Descriptive statistics

Standard deviation: 20170.91154
Coefficient of variation (CV): 2.55652611
Kurtosis: 49.23429765
Mean: 7889.968914
Median Absolute Deviation (MAD): 1110.968521
Skewness: 5.792566483
Sum: 1.036051701e+11
Variance: 406865672.3
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value                    Count     Frequency (%)
0                        663       < 0.1%
7.999999992              22        < 0.1%
6.999999993              15        < 0.1%
32                       14        < 0.1%
36                       11        < 0.1%
38                       10        < 0.1%
69                       9         < 0.1%
9.99999999               8         < 0.1%
40                       8         < 0.1%
8.999999991              8         < 0.1%
30                       7         < 0.1%
32                       7         < 0.1%
13.99999998              6         < 0.1%
81                       6         < 0.1%
34                       6         < 0.1%
72                       6         < 0.1%
40                       6         < 0.1%
16.99999997              5         < 0.1%
15.99999998              5         < 0.1%
42                       5         < 0.1%
28                       5         < 0.1%
66                       5         < 0.1%
36                       4         < 0.1%
71.99999993              4         < 0.1%
140.9999999              4         < 0.1%
Other values (13130063)  13130403  > 99.9%
 
Minimum 10 values
Value                    Count     Frequency (%)
0                        663       < 0.1%
0.001331510022           1         < 0.1%
0.004582775418           1         < 0.1%
0.00735235792            1         < 0.1%
0.009273340173           1         < 0.1%
0.01139566639            1         < 0.1%
0.01539847728            1         < 0.1%
0.01702100338            1         < 0.1%
0.01871016156            1         < 0.1%
0.02341073477            1         < 0.1%
 
Maximum 10 values
Value                    Count     Frequency (%)
596106.4354              1         < 0.1%
587656.1896              1         < 0.1%
578674.1918              1         < 0.1%
569285.0824              1         < 0.1%
559229.0461              1         < 0.1%
552016.9758              1         < 0.1%
549022.2706              1         < 0.1%
543476.9792              1         < 0.1%
539503.9764              1         < 0.1%
538662.6931              1         < 0.1%
 

earned_reward_points
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 3249
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 365.3978279
Minimum: 0
Maximum: 13534
Zeros: 115064
Zeros (%): 0.9%
Memory size: 100.2 MiB

Quantile statistics

Minimum: 0
5-th percentile: 15
Q1: 50
Median: 108
Q3: 318
95-th percentile: 1679
Maximum: 13534
Range: 13534
Interquartile range (IQR): 268

Descriptive statistics

Standard deviation: 726.1654496
Coefficient of variation (CV): 1.987328315
Kurtosis: 28.1180625
Mean: 365.3978279
Median Absolute Deviation (MAD): 74
Skewness: 4.515913938
Sum: 4798130958
Variance: 527316.2601
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value                    Count     Frequency (%)
36                       294812    2.2%
72                       265280    2.0%
34                       264388    2.0%
38                       258962    2.0%
32                       240499    1.8%
66                       237516    1.8%
69                       234240    1.8%
40                       222419    1.7%
75                       202892    1.5%
30                       199810    1.5%
42                       191644    1.5%
63                       190786    1.5%
60                       188311    1.4%
78                       166323    1.3%
84                       162057    1.2%
120                      145934    1.1%
28                       127468    1.0%
108                      122536    0.9%
81                       120952    0.9%
96                       116554    0.9%
44                       115419    0.9%
0                        115064    0.9%
100                      114642    0.9%
48                       109061    0.8%
57                       108197    0.8%
Other values (3224)      8615486   65.6%
 
Minimum 10 values
Value                    Count     Frequency (%)
0                        115064    0.9%
2                        3467      < 0.1%
3                        7013      0.1%
4                        10191     0.1%
5                        12077     0.1%
6                        15995     0.1%
7                        18069     0.1%
8                        27420     0.2%
9                        32517     0.2%
10                       50869     0.4%
 
Maximum 10 values
Value                    Count     Frequency (%)
13534                    1         < 0.1%
13433                    1         < 0.1%
13200                    3         < 0.1%
13068                    1         < 0.1%
12969                    3         < 0.1%
12870                    2         < 0.1%
12838                    5         < 0.1%
12740                    12        < 0.1%
12642                    6         < 0.1%
12610                    1         < 0.1%
 

cluster
Categorical

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 100.2 MiB
Common values
Value                    Count     Frequency (%)
B                        6644458   50.6%
C                        3927959   29.9%
A                        2558835   19.5%
 
Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%
Histogram of lengths of the category

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Overview of Unicode Properties

Unique unicode characters: 3
Unique unicode categories: 1
Unique unicode scripts: 1
Unique unicode blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

Value                    Count      Frequency (%)
B                        6644458    50.6%
C                        3927959    29.9%
A                        2558835    19.5%
 

Most occurring categories

Value                    Count      Frequency (%)
Uppercase Letter         13131252   100.0%
 

Most frequent Uppercase Letter characters

Value                    Count      Frequency (%)
B                        6644458    50.6%
C                        3927959    29.9%
A                        2558835    19.5%
 

Most occurring scripts

Value                    Count      Frequency (%)
Latin                    13131252   100.0%
 

Most frequent Latin characters

Value                    Count      Frequency (%)
B                        6644458    50.6%
C                        3927959    29.9%
A                        2558835    19.5%
 

Most occurring blocks

Value                    Count      Frequency (%)
ASCII                    13131252   100.0%
 

Most frequent ASCII characters

Value                    Count      Frequency (%)
B                        6644458    50.6%
C                        3927959    29.9%
A                        2558835    19.5%
 

churned
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 12.5 MiB
Common values
Value                    Count     Frequency (%)
False                    12540477  95.5%
True                     590775    4.5%
 

Interactions

[64 interaction plots; images not preserved in this text export]

Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, so for a linear relationship the steepness of the line does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
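
The calculation described above can be sketched in plain Python as follows. The data are toy values chosen for illustration, not taken from the dataset, and the function name is hypothetical.

```python
import math

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # The 1/n (or 1/(n-1)) factors cancel between numerator and denominator.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Toy data: both y series are exact linear functions of x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y_up = [2 * v + 1 for v in x]        # increasing line
y_down = [-0.5 * v + 10 for v in x]  # decreasing line, different slope and offset

print(pearson_r(x, y_up), pearson_r(x, y_down))
```

The two lines have different slopes and intercepts, yet r is +1 for the increasing line and -1 for the decreasing one (up to floating-point error), which illustrates the invariance under changes of location and scale.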

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
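
A minimal sketch of this rank-based calculation, again on toy data: values are replaced by their ranks (ties share the average rank, an assumption about tie handling that the text does not spell out) and Pearson's formula is applied to the ranks. All function names are hypothetical.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def spearman_rho(x, y):
    """Pearson correlation of the rank variables."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Toy data: y grows monotonically but not linearly with x.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]

print(spearman_rho(x, y), pearson_r(x, y))
```

On this cubic relationship ρ reaches 1 while r stays below 1, which is the difference between monotonic and linear correlation.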

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the discordant pairs divided by the total number of pairs.
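
The pair-counting definition above can be written out directly. This sketch implements that formula as stated (often called tau-a, with no correction for ties); it is quadratic in the number of observations and is meant only as an illustration on toy data. The function name is hypothetical, and the report may use a tie-corrected variant.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        product = (x[i] - x[j]) * (y[i] - y[j])
        if product > 0:
            concordant += 1   # both variables order the pair the same way
        elif product < 0:
            discordant += 1   # the variables order the pair in opposite ways
        # product == 0 means a tie, which counts for neither side
    return (concordant - discordant) / len(pairs)

# Toy data: 10 pairs in total, 8 concordant and 2 discordant.
x = [1, 2, 3, 4, 5]
y = [1, 3, 2, 5, 4]

tau = kendall_tau_a(x, y)
print(tau)
```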

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available in the Phik project documentation.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
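
A minimal sketch of a bias-corrected Cramér's V along the lines of Bergsma's 2013 proposal, written from the published formula rather than from the report's own code. The function name and the toy contingency tables are hypothetical, and the sketch assumes no row or column of the table is entirely zero.

```python
import math

def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table given as a list of rows."""
    n = sum(sum(row) for row in table)
    r, k = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(r)) for j in range(k)]

    # Pearson chi-squared statistic against the independence expectation.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Bias correction: shrink phi^2 and the effective table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    return math.sqrt(phi2_corr / min(k_corr - 1, r_corr - 1))

# Toy tables: counts proportional across rows, and a perfectly diagonal table.
independent = [[20, 30], [40, 60]]
associated = [[50, 0], [0, 50]]

print(cramers_v_corrected(independent), cramers_v_corrected(associated))
```

The first table gives 0 (independence) and the second gives 1 (perfect association), the two ends of the range described above.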

Missing values

[2 missing-value plots; images not preserved in this text export]

Sample

First rows

(index)   customer_id  month       months_since_joined  referred_friends  transaction_count  transaction_value  clv           total_reward_points  earned_reward_points  cluster  churned
0         2            2017-02-01  62                   1                 58.0               743.785020         24461.876134  41822.999177         2146.0                B        False
1         4            2017-02-01  62                   1                 17.0               185.170485         6089.955209   9374.999673          153.0                 B        False
2         7            2017-02-01  62                   1                 71.0               71.000000          2335.074193   14302.040625         213.0                 B        False
3         14           2017-02-01  62                   0                 25.0               187.986119         6182.556815   3200.996803          225.0                 C        False
4         20           2017-02-01  62                   0                 63.0               342.682994         11270.284723  31744.997234         1071.0                B        False
5         30           2017-02-01  62                   0                 48.0               247.572585         8142.258498   16286.604891         576.0                 C        False
6         40           2017-02-01  62                   0                 20.0               82.002807          2696.938577   2939.997050          80.0                  C        False
7         43           2017-02-01  62                   2                 15.0               174.113245         5726.300627   5386.062256          120.0                 B        False
8         47           2017-02-01  62                   3                 69.0               400.679664         13177.700606  59622.632574         1380.0                B        False
9         89           2017-02-01  62                   2                 37.0               189.368177         6228.010481   8602.997289          333.0                 A        False

Last rows

(index)   customer_id  month       months_since_joined  referred_friends  transaction_count  transaction_value  clv           total_reward_points  earned_reward_points  cluster  churned
13131242  1423199      2020-10-01  1                    0                 27.0               72.919534          345.643473    79.222762            81.0                  B        False
13131243  1423200      2020-10-01  1                    2                 21.0               59.631172          282.655748    41.994693            42.0                  B        False
13131244  1423201      2020-10-01  1                    1                 20.0               54.684560          259.208475    39.998835            40.0                  C        False
13131245  1423202      2020-10-01  1                    0                 17.0               47.206802          223.763401    33.999995            34.0                  C        False
13131246  1423203      2020-10-01  1                    1                 13.0               33.609168          159.309706    12.489877            13.0                  B        True
13131247  1423204      2020-10-01  1                    0                 26.0               72.892389          345.514806    77.987312            78.0                  B        True
13131248  1423205      2020-10-01  1                    0                 19.0               52.500548          248.856114    37.988428            38.0                  B        False
13131249  1423206      2020-10-01  1                    2                 19.0               55.659544          263.829967    37.999962            38.0                  C        False
13131250  1423207      2020-10-01  1                    1                 22.0               63.473847          300.870287    65.999949            66.0                  B        False
13131251  1423208      2020-10-01  1                    2                 22.0               65.412040          310.057450    65.999415            66.0                  C        False